Client Based Power Iteration Clustering Algorithm to Reduce Dimensionality in Big Data
Abstract
Clustering groups objects so that those in the same cluster are similar to one another but dissimilar to objects in other clusters. Clustering large datasets is a challenging task, and the need for greater scalability and performance motivates the use of parallelism. Although Big Data has become essential, analyzing it is demanding. This paper presents pC-PIC, a parallel client-based Power Iteration Clustering algorithm built on parallel PIC, which in turn originates from PIC (Power Iteration Clustering). PIC performs clustering by embedding data points in a low-dimensional space derived from the similarity matrix. The proposed client-based algorithm, pC-PIC, outperforms the server-side processing and reduces its execution time. The experimental results show that pC-PIC performs well on big data: it is fast and scalable. The results also show that its clustering accuracy is nearly the same as that of the original algorithm. Hence pC-PIC is fast, scalable and accurate.
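The PIC step mentioned in the abstract (embedding points in a low-dimensional space derived from the similarity matrix) can be illustrated with a minimal serial sketch. This is not the pC-PIC algorithm itself: the abstract does not describe how work is divided between client and server, so that part is left out. The Gaussian similarity, the `sigma` bandwidth, the degree-based start vector, the stopping rule, and all function names (`gaussian_affinity`, `pic_embedding`, `kmeans_1d`, `pic_cluster`) are assumptions made for illustration.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian similarity with a zero diagonal (sigma is an assumed bandwidth)."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A


def pic_embedding(A, max_iter=100, tol=1e-6):
    """Truncated power iteration on the row-normalized similarity matrix.

    Returns a one-dimensional embedding (one value per point).
    Assumes every point has nonzero total similarity.
    """
    degrees = [sum(row) for row in A]
    W = [[a / d for a in row] for row, d in zip(A, degrees)]
    total = sum(degrees)
    v = [d / total for d in degrees]  # assumed start vector: normalized degrees
    prev_delta = None
    for _ in range(max_iter):
        new_v = [sum(w * x for w, x in zip(row, v)) for row in W]
        norm = sum(abs(x) for x in new_v)  # L1 normalization
        new_v = [x / norm for x in new_v]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        # Stop early, once the change per step has itself stopped changing,
        # rather than iterating all the way to the uniform vector.
        if prev_delta is not None and abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, iters=50):
    """Plain k-means on scalars, with centers initialized evenly between min and max."""
    lo, hi = min(values), max(values)
    if k > 1:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centers = [sum(values) / len(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels


def pic_cluster(points, k, sigma=1.0):
    """Cluster points by running k-means on the one-dimensional PIC embedding."""
    return kmeans_1d(pic_embedding(gaussian_affinity(points, sigma)), k)
```

Calling `pic_cluster(points, k=2)` on a small list of 2-D tuples returns one integer label per point. Note that with a degree-based start vector, clusters of identical size and shape begin with nearly identical embedding values, so a different start vector may be needed in that case.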
Similar resources
Parallel Power Iteration Clustering for Big Data using MapReduce in Hadoop
Distributed data mining is currently one of the most popular research topics: as data volumes grow day by day, many problems arise in handling them, and the existing solutions still fall short of expectations. Several issues in distributed data mining remain open; among them, this paper focuses mainly on reduci...
CLUS: Parallel Subspace Clustering Algorithm on Spark
Subspace clustering techniques were proposed to discover hidden clusters that only exist in certain subsets of the full feature spaces. However, the time complexity of such algorithms is at most exponential with respect to the dimensionality of the dataset. In addition, datasets are generally too large to fit in a single machine under the current big data scenarios. The extremely high computati...
Improved COA with Chaotic Initialization and Intelligent Migration for Data Clustering
A well-known clustering algorithm is K-means. This algorithm, besides advantages such as high speed and ease of employment, suffers from the problem of local optima. In order to overcome this problem, a lot of studies have been done in clustering. This paper presents a hybrid Extended Cuckoo Optimization Algorithm (ECOA) and K-means (K), which is called ECOA-K. The COA algorithm has advantages ...
High-performance K-means Implementation based on a Coarse-grained Map-Reduce Architecture
The k-means algorithm is one of the most common clustering algorithms and is widely used in data mining and pattern recognition. The increasing computational requirements of big data applications make hardware acceleration for the k-means algorithm necessary. In this paper, a coarse-grained Map-Reduce architecture is proposed to implement the k-means algorithm on an FPGA. Algorithmic segmentation, d...
Data Clustering Based on Key Identification
Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend toward big data analysis places more demanding requirements on new clustering algorithms, which must be both scalable and accura...